Parsing CFGs and PCFGs with a Chomsky-Schützenberger Representation

Author

  • Mans Hulden
Abstract

We present a parsing algorithm for arbitrary context-free and probabilistic context-free grammars based on a representation of such grammars as a combination of a regular grammar and a grammar of balanced parentheses, similar to the representation used in the Chomsky-Schützenberger theorem. The basic algorithm has the same worst-case complexity as the popular CKY and Earley parsing algorithms frequently employed in natural language processing tasks. As natural languages rarely take advantage of the crucial distinguishing feature between regular and context-free languages, that of center embedding, we also investigate methods to speed up parsing at the cost of some overgeneration by forgoing the enforcement of proper nesting of constituents in the algorithm.
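As a rough illustration of the kind of representation the abstract refers to, the sketch below encodes the toy language { a^n b^n : n ≥ 1 } as a regular language over brackets, a balanced-parentheses (Dyck) check, and a symbol mapping. It is not the paper's parsing algorithm. The names `H`, `REGULAR`, `is_balanced`, `accepts`, and the `enforce_nesting` flag are all assumptions introduced for this example.

```python
import re

# Hypothetical toy encoding (not from the paper) for { a^n b^n : n >= 1 }:
# each bracket symbol maps to one terminal.
H = {"(": "a", ")": "b"}

# Regular component: one or more opening brackets, then one or more closing.
REGULAR = re.compile(r"\(+\)+")


def is_balanced(brackets: str) -> bool:
    """Dyck check for a single bracket pair: depth never negative, ends at 0."""
    depth = 0
    for ch in brackets:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def accepts(word: str, enforce_nesting: bool = True) -> bool:
    """Accept `word` if its bracket encoding lies in the regular language and,
    when `enforce_nesting` is set, is also properly nested."""
    inverse = {terminal: bracket for bracket, terminal in H.items()}
    if any(ch not in inverse for ch in word):
        return False
    bracketed = "".join(inverse[ch] for ch in word)
    if REGULAR.fullmatch(bracketed) is None:
        return False
    return is_balanced(bracketed) if enforce_nesting else True
```

With nesting enforced, `accepts("aabb")` holds and `accepts("aab")` does not. With `enforce_nesting=False` the check accepts `"aab"` as well, a small instance of the overgeneration the abstract describes when proper nesting is not enforced.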

Similar resources

A Chomsky-Schützenberger Representation for Weighted Multiple Context-free Languages

We prove a Chomsky-Schützenberger representation theorem for weighted multiple context-free languages.

Multidimensional trees and a Chomsky-Schützenberger-Weir representation theorem for simple context-free tree grammars

Weir [43] proved a Chomsky-Schützenberger-like representation theorem for the string languages of tree-adjoining grammars, where the Dyck language D_n in the Chomsky-Schützenberger characterization is replaced by the intersection D_{2n} ∩ g(D_{2n}), where g is a certain bijection on the alphabet consisting of 2n pairs of brackets. This paper presents a generalization of this theorem to the string lang...

Inferring (k, l)-context-sensitive probabilistic context-free grammars using hierarchical Pitman-Yor processes

Motivated by the idea of applying nonparametric Bayesian models to dual approaches for distributional learning, we define (k, l)-context-sensitive probabilistic context-free grammars (PCFGs) using hierarchical Pitman-Yor processes (PYPs). The data sparseness problem that occurs when inferring context-sensitive probabilities for rules is handled by the smoothing effect of hierarchical PYPs. Many...

Probabilistic Context-free Grammars in Natural Language Processing

Context-free grammars (CFGs) are a class of formal grammars that have found numerous applications in modeling computer languages. A probabilistic form of CFG, the probabilistic CFG (PCFG), has also been successfully applied to model natural languages. In this paper, we discuss the use of PCFGs in natural language modeling. We develop PCFGs as a natural extension of the CFGs and explain one prob...

Combining Labeled and Unlabeled Data in Statistical Natural Language Parsing

Anoop Sarkar. Supervisor: Professor Aravind Joshi. Ambiguity resolution in the parsing of natural language requires a vast repository of knowledge to guide disambiguation. An effective approach to this problem is to use machine learning algorithms to acquire the needed knowledge and to extract generalizations about disam...

Publication date: 2009